Forecasting Anomalies in AtHub’s Stock Behavior

INFO 523 - Final Project

Project description

Author: Annabelle Zhu
Affiliation: College of Information Science, University of Arizona

Abstract

This project investigates whether abnormal price and volume fluctuations in AtHub (603881.SH), a Chinese data center infrastructure firm, can be predicted using technical analysis (TA) features. We define volatility anomalies as days with an absolute daily return above 5% or a trading volume above twice the 30-day rolling average. Drawing on over 30 engineered TA indicators spanning momentum, trend, volume, and volatility categories, we construct a supervised learning pipeline to forecast next-day anomalies. The model is evaluated using time-aware cross-validation and interpreted through SHAP analysis to reveal leading patterns and feature contributions. Results suggest that certain TA combinations (e.g., high RSI with declining OBV) consistently precede large movements, demonstrating the potential of interpretable, data-driven tools for anomaly detection in high-volatility equities.


Introduction

Predicting sudden shifts in equity price or trading volume is a long-standing challenge in financial forecasting, particularly for high-volatility stocks sensitive to external shocks. This project centers on AtHub (603881.SH), a stock known for its erratic short-term behavior and policy-driven sensitivity, to assess whether machine learning models can detect early signs of abnormal market activity. Unlike traditional models that aim to forecast precise price levels, our approach reframes the task as a binary classification problem focused on identifying rare but impactful events. We rely exclusively on market-based features—technical indicators derived from historical prices and volumes—to build a predictive framework that aligns with real-world constraints where external signals (e.g., news sentiment, fundamentals) may be unavailable or delayed. By integrating explainable AI methods into the model workflow, this project also emphasizes transparency and trustworthiness in financial ML applications.


Research Questions

  • Q1. Can TA features detect anomalies 1–3 days in advance? Which indicators lead?

  • Q2. Which features drive predictions? Do they align with financial theory?

  • Q3. How do anomaly thresholds (±3% vs. ±5% vs. ±7% price; 1.8× vs. 2.5× volume) impact model performance?


Exploratory Analysis

Loading and Initial Preparation

Total observations: 375
Number of Columns: 31

Target Variable Engineering

Define the binary target: will there be an anomaly tomorrow?

To better understand the imbalance in the target variable, we plot the proportion of anomaly vs. normal days. An anomaly day is defined as either a price change exceeding ±5% or a volume spike above twice the 30-day moving average. The bar chart highlights the class imbalance, a common challenge in financial anomaly detection.

Class Distribution of Target Labels
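The labeling rule above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: it assumes the data are sorted by trade date (oldest first), that pct_chg is expressed in percent, and that the 30-day average is a plain trailing rolling mean. The function name make_target is hypothetical. Because the thresholds are parameters, the Q3 variants (±3%/±7% price, 1.8×/2.5× volume) only require changing price_thr and vol_mult.

```python
import pandas as pd


def make_target(df: pd.DataFrame, price_thr: float = 5.0, vol_mult: float = 2.0) -> pd.DataFrame:
    """Flag anomaly days and build the next-day binary target.

    Assumes df is sorted by trade date (oldest first) and that pct_chg is in
    percent (5.0 means a 5% move).
    """
    out = df.copy()
    # 30-day rolling mean of volume; the first 29 rows have no full window (NaN)
    out["vol_ma30"] = out["vol"].rolling(window=30).mean()
    price_anom = out["pct_chg"].abs() > price_thr
    # comparisons against NaN evaluate to False, so warm-up rows cannot be volume anomalies
    vol_anom = out["vol"] > vol_mult * out["vol_ma30"]
    out["anomaly"] = (price_anom | vol_anom).astype(int)
    # the target for day t is the anomaly flag of day t+1; the last day has no label
    out["target"] = out["anomaly"].shift(-1)
    out = out.dropna(subset=["target"]).copy()
    out["target"] = out["target"].astype(int)
    return out


# Toy series: a +6% day at row 30 and a volume jump to 250 at row 31
toy = pd.DataFrame({
    "pct_chg": [1.0] * 30 + [6.0, 0.5, -0.2],
    "vol": [100.0] * 30 + [100.0, 250.0, 100.0],
})
labeled = make_target(toy)
print(labeled["target"].value_counts(normalize=True))  # class balance, as in the bar chart
```

Dropping the final row is one way to handle the missing next-day label; filling it instead would silently mark the last day as normal.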

Data Cleaning

Missing values per column:
ts_code               0
open                  0
high                  0
low                   0
close                 0
pct_chg               0
vol                   0
amount                0
volume_obv            0
volume_cmf            0
volume_vpt            0
volume_vwap           0
volume_mfi            0
volatility_bbw        0
volatility_atr        0
volatility_ui         0
trend_macd            0
trend_macd_signal     0
trend_macd_diff       0
trend_adx             0
trend_adx_pos         0
trend_adx_neg         0
momentum_rsi          0
momentum_wr           0
momentum_roc          0
momentum_ao           0
momentum_ppo_hist     0
trend_cci             0
trend_aroon_up        0
trend_aroon_down      0
trend_aroon_ind       0
vol_ma30             29
anomaly               0
target                0
dtype: int64
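The 29 missing values in vol_ma30 are consistent with a 30-day rolling mean computed with pandas' default min_periods, which leaves the first 29 rows without a full window. The check below uses a synthetic series of the same length (375 rows); dropping the warm-up rows is shown as one possible treatment and is an assumption, not necessarily what the project did.

```python
import numpy as np
import pandas as pd

# Synthetic volume series with the same number of rows as the data set
vol = pd.Series(np.arange(1, 376, dtype=float))
vol_ma30 = vol.rolling(window=30).mean()
print(vol_ma30.isna().sum())  # 29: rows 0..28 lack a full 30-day window

# One option (assumed here): drop the warm-up rows before modeling
clean = pd.DataFrame({"vol": vol, "vol_ma30": vol_ma30}).dropna()
print(len(clean))  # 375 - 29 = 346
```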

Data Reduction

We remove columns that are not needed for modeling.

Remaining features: 30
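The report does not list which columns were removed. A plausible sketch, assuming the ticker code and the label/helper columns are the ones excluded, separates model inputs from the target as below; the list NON_FEATURES and the function split_features are hypothetical. Dropping these four from the 34 columns listed in the missing-value table would leave 30, which matches the reported count, but that is an inference rather than something the report states.

```python
import pandas as pd

# Hypothetical list: the report does not say which columns were removed
NON_FEATURES = ["ts_code", "vol_ma30", "anomaly", "target"]


def split_features(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Return (X, y): model inputs without identifier/label columns, and the target."""
    y = df["target"]
    X = df.drop(columns=[c for c in NON_FEATURES if c in df.columns])
    return X, y


toy = pd.DataFrame({
    "ts_code": ["603881.SH"] * 3,
    "close": [10.0, 10.5, 9.8],
    "momentum_rsi": [55.0, 61.0, 48.0],
    "vol_ma30": [None, None, None],
    "anomaly": [0, 1, 0],
    "target": [1, 0, 0],
})
X, y = split_features(toy)
print(list(X.columns))  # ['close', 'momentum_rsi']
```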

Correlation Analysis

Correlation Matrix of Selected Features

The matrix shows no highly correlated feature pairs.
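The report does not state the cutoff used for "highly correlated". The sketch below lists feature pairs whose absolute Pearson correlation exceeds a threshold; the value 0.9 and the function name high_corr_pairs are assumptions for illustration. Scanning only the upper triangle reports each pair once and skips the trivial diagonal.

```python
import pandas as pd


def high_corr_pairs(X: pd.DataFrame, threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """List feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = X.select_dtypes("number").corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle: each pair once, no diagonal
            r = corr.iloc[i, j]
            if r > threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [3.0, 5.0, 7.0, 9.0, 11.0],  # exactly 2a + 1
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],   # |corr| with a and b is 0.8
})
print(high_corr_pairs(toy))  # one pair: ('a', 'b', ~1.0)
```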

Data Transformation

Feature skewness before transformation:
vol           2.260647
amount        2.817781
volume_obv    2.174151
volume_vpt    0.949351
dtype: float64

The output shows that vol, amount, and volume_obv are highly right-skewed (skewness above 2), while volume_vpt is moderately right-skewed (about 0.95). We therefore apply a log transformation to these four features.
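A plain log is undefined for zero or negative values, and cumulative indicators such as OBV and VPT can be negative. The sketch below therefore uses a signed log1p transform, sign(x) · log(1 + |x|), which is an assumption: the report does not say which variant was applied. The log_ prefix matches feature names such as log_vol and log_amount in the mutual-information ranking; the helper names are hypothetical.

```python
import numpy as np
import pandas as pd

SKEWED = ["vol", "amount", "volume_obv", "volume_vpt"]


def signed_log1p(s: pd.Series) -> pd.Series:
    # sign(x) * log(1 + |x|): monotone, 0 at 0, and defined for negative values
    return np.sign(s) * np.log1p(s.abs())


def log_transform(df: pd.DataFrame, cols=SKEWED) -> pd.DataFrame:
    out = df.copy()
    for c in cols:
        out[f"log_{c}"] = signed_log1p(out[c])
    # replace each raw column with its log_ counterpart
    return out.drop(columns=cols)


toy = pd.DataFrame({
    "vol": [1e3, 2e3, 5e3, 1e6],
    "amount": [1e4, 3e4, 2e4, 5e7],
    "volume_obv": [-2e3, 1e3, 4e3, 1e6],
    "volume_vpt": [-50.0, 10.0, 20.0, 900.0],
})
logged = log_transform(toy)
print(toy.skew().round(2))
print(logged.skew().round(2))
```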

Feature Engineering

Creating Lag Features

To capture predictive patterns leading up to volatility events, we create lagged versions of key indicators. These lagged features serve as candidate leading indicators, allowing the model to detect precursor signals 1–3 days before an anomaly occurs.
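Lag features of this kind are typically built with Series.shift. The sketch below follows the <col>_lag<k> naming seen in the feature ranking (e.g., volatility_atr_lag1, momentum_rsi_lag2); which indicators were lagged in the project is not stated, so the column choice and the helper add_lags are illustrative.

```python
import pandas as pd


def add_lags(df: pd.DataFrame, cols, lags=(1, 2, 3)) -> pd.DataFrame:
    """Add <col>_lag<k> columns holding the value from k rows (trading days) earlier."""
    out = df.copy()
    for c in cols:
        for k in lags:
            out[f"{c}_lag{k}"] = out[c].shift(k)  # first k rows become NaN
    return out


toy = pd.DataFrame({"volatility_atr": [1.0, 2.0, 3.0, 4.0, 5.0],
                    "momentum_rsi": [50.0, 55.0, 60.0, 65.0, 70.0]})
lagged = add_lags(toy, ["volatility_atr", "momentum_rsi"])
print(lagged.iloc[-1])
```

Positive shifts only look backward, so a lagged value at day t never contains information from after day t.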

Creating Rolling Statistics

Rolling window statistics help capture evolving market conditions and short-term trends that may precede volatility events.
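A trailing rolling mean is one common rolling statistic. The sketch uses windows of 5 and 10 days because the ranking contains log_volume_vpt_ma5 and volatility_atr_ma10; whether the project also used other statistics (e.g., rolling standard deviations) is not stated, and the helper add_rolling_means is hypothetical.

```python
import pandas as pd


def add_rolling_means(df: pd.DataFrame, spec: dict) -> pd.DataFrame:
    """Add <col>_ma<w> trailing means; spec maps column name -> list of windows."""
    out = df.copy()
    for col, windows in spec.items():
        for w in windows:
            # mean of the w most recent rows up to and including day t (no future data)
            out[f"{col}_ma{w}"] = out[col].rolling(window=w).mean()
    return out


toy = pd.DataFrame({"volatility_atr": [float(i) for i in range(1, 13)],
                    "log_volume_vpt": [float(i) for i in range(12)]})
rolled = add_rolling_means(toy, {"volatility_atr": [10], "log_volume_vpt": [5]})
print(rolled.tail(3))
```

Because the target refers to day t+1, a trailing window ending at day t does not leak the label.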

Interaction Features

We create interaction terms between key indicators that financial theory suggests may combine to signal impending volatility.
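The report names one interaction feature (rsi_vol_interaction) but not its formula. The sketch below assumes it is the product of RSI and log volume, and adds a second, equally hypothetical flag for the "high RSI with declining OBV" pattern mentioned in the abstract, using the conventional overbought RSI level of 70. Both definitions are illustrations, not the project's.

```python
import pandas as pd


def add_interactions(df: pd.DataFrame) -> pd.DataFrame:
    """Add two illustrative interaction features (both definitions are hypothetical)."""
    out = df.copy()
    # Hypothetical definition: RSI scaled by log volume
    out["rsi_vol_interaction"] = out["momentum_rsi"] * out["log_vol"]
    # Hypothetical flag: overbought RSI (> 70) on a day when OBV falls
    obv_falling = out["volume_obv"].diff() < 0  # NaN on the first row compares as False
    out["rsi_high_obv_down"] = ((out["momentum_rsi"] > 70) & obv_falling).astype(int)
    return out


toy = pd.DataFrame({"momentum_rsi": [60.0, 75.0, 80.0],
                    "log_vol": [1.0, 2.0, 3.0],
                    "volume_obv": [10.0, 12.0, 11.0]})
print(add_interactions(toy))
```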

Feature Importance

We use mutual information to identify the most predictive features for our anomaly target.

Top 20 features by mutual information:
['log_amount', 'log_vol', 'high', 'volume_vwap', 'open', 'low', 'volatility_atr_lag1', 'trend_macd', 'volatility_atr', 'log_volume_vpt_ma5', 'volatility_atr_ma10', 'volatility_atr_lag2', 'close', 'trend_cci', 'volatility_atr_lag3', 'momentum_rsi_lag2', 'volatility_ui', 'rsi_vol_interaction', 'log_volume_vpt', 'pct_chg']
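A ranking like the one above can be produced with scikit-learn's mutual_info_classif; the report does not say which estimator was used, so this is an assumed implementation, and top_k_by_mi is a hypothetical helper. Rows with NaN (from lag and rolling warm-up) are dropped because the estimator does not accept missing values. Computing the scores on the training period only would keep test-period information out of feature selection.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif


def top_k_by_mi(X: pd.DataFrame, y: pd.Series, k: int = 20, random_state: int = 0) -> pd.Series:
    """Rank numeric features by estimated mutual information with a binary target."""
    X_num = X.select_dtypes("number").dropna()  # lag/rolling warm-up rows contain NaN
    y_num = y.loc[X_num.index]
    mi = mutual_info_classif(X_num, y_num, discrete_features=False, random_state=random_state)
    return pd.Series(mi, index=X_num.columns).sort_values(ascending=False).head(k)


rng = np.random.default_rng(0)
y_demo = pd.Series(rng.integers(0, 2, size=300))
X_demo = pd.DataFrame({
    "informative": y_demo + rng.normal(0, 0.3, size=300),  # shifted by the label
    "noise": rng.normal(size=300),
})
print(top_k_by_mi(X_demo, y_demo, k=2))
```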


Data Preprocessing